FIELD OF THE INVENTION

The present invention relates to a method for encoding a digital video sequence, 
said digital video sequence comprising some sets of images including disparity maps, a
disparity map being used to reconstruct one image of a set of images from a reference 
image of said set of images. The invention also relates to an encoder, said encoder 
implementing said method. 

Such a method may be used in, for example, a video communication system for 3D 
video applications within MPEG standards. 

BACKGROUND OF THE INVENTION 

A video communication system typically comprises a transmitter with an encoder 
and a receiver with a decoder. Such a system receives an input digital video sequence,
encodes said sequence via the encoder, transmits the encoded sequence to the receiver,
then decodes the transmitted sequence via the decoder resulting in an output digital video
sequence, which is the reconstructed sequence of the input digital video sequence. The
receiver then displays said output digital video sequence. A 3D digital video sequence
comprises some sets of images with objects, usually one first set of texture images along
with another set of images called disparity images or disparity maps. An image comprises
some pixels. 

Each image of the digital video signal is encoded along different general coding 
schemes, which have already been proposed within the scope of MPEG. For example, the 
MPEG2 standard referenced "Draft amendment N°3 to 13818-2 Multi-view profile- 
JTC1/SC29/WG11N1088" edited by ISO/IEC in November 1995 during the MPEG Meeting of 
Dallas (Texas), has set the basis for the encoding of different views of a same video 
sequence. The main principle is not only, as in most traditional video coding schemes, to
use temporal and spatial redundancies within one video sequence, but also to use
redundancies between the different points of view within a video sequence, wherein each
point of view is an image, for example a left image and a right image captured respectively
by a left camera and a right camera. As objects of a video sequence seen from two slightly
different points of view do not differ very much, it is possible to predict a large part of the
points of view from reference points of view by virtue of prediction vectors, also called
disparity vectors.

Since it is always possible to have disparity vectors that are all along the same
direction, it is often supposed that there are only horizontal disparity vectors. In this case, a
disparity vector is defined by a single value, called the disparity value. The disparity map is
an image in which a disparity value is associated with every pixel.

These disparity values are encoded by the encoder and transmitted to the decoder.
A reference image is also sent to the decoder, for example the left one. Said decoder will
use, amongst other parameters, the disparity values to reconstruct the right image from the
reference image.

Various encoding schemes well known to the person skilled in the art, such as DCT-based
coding, lossless run-length coding or mesh-based coding, can be used to encode an image. In
all these encoding schemes, the disparity values are usually encoded as n-integer values,
often on 8 bits of data representing 256 gray levels.

One drawback of these encoding schemes is that, at the receiver side, one does not
know exactly how to translate the disparity map of a texture image solely from these gray
level data.

Indeed, depending on a video sequence's content, the disparity map of a texture 
image can change dramatically and hence the translation.

If the video sequence contains only objects filmed at very close distance, disparity
may need to be quite accurate, with sub-pixel accuracy. On the contrary, if the camera
focuses on rather distant objects, sub-pixel accuracy might be of no interest, whereas there
might be some very large values of disparity. Finally, there might be a mixed situation, with
different regions of interest within the scene and a need for a non-linearly varying set of
disparity values.

Therefore, because of this problem of translation of the disparity map in the prior art,
there is often, at the receiver side, a manual tuning of the 3D display in order:

- to view correctly in 3D, so that a reconstructed image is equal to, or has
few distortions compared to, the original one, and likewise for the reconstructed
video sequence, and/or

- to view correctly in 3D a second 3D video sequence after a previous 3D
video sequence, sent by two different broadcasters for example, if those two
video sequences have totally different associated disparity values.

And, if the manual tuning has to be done very often, it will cause some great
discomfort for a viewer of a 3D video sequence.

SUMMARY OF THE INVENTION 

Accordingly, it is an object of the invention to provide a method and an encoder for
encoding a digital video sequence, said digital video sequence comprising some sets of
images including disparity maps, a disparity map being used to reconstruct one image of a
set of images from a reference image of said set of images, which allow a precise translation
of the disparity map.

To this end, there is provided a method comprising the steps of:

- encoding a type of the disparity map to use for the reconstruction of an image, 
and 

- encoding the disparity map. 

In addition, there is provided an encoder comprising first encoding means adapted 
for encoding a type of the disparity map to use for the reconstruction of an image, and 
second encoding means for encoding the disparity map. 

As we will see in detail further on, by encoding the type of the disparity map, and 
more precisely by encoding the way to compute the disparity values from the 8 bits of gray 
levels, the disparity map of 3D video sequences is efficiently represented and the processing 
of the disparity map on the display side of the video chain is performed automatically.

BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects, features and advantages of the invention will become apparent 
upon reading the following detailed description and upon reference to the accompanying 
drawings in which: 

- Fig. 1 illustrates a video communication system comprising an encoder and a decoder 
according to the invention, and 

- Fig. 2 is a schematic diagram of the method of encoding performed by the encoder of
Fig. 1.



DETAILED DESCRIPTION OF THE INVENTION 

In the following description, functions or constructions well known to the person
skilled in the art are not described in detail since they would obscure the invention in
unnecessary detail.

The present invention relates to a method for encoding a digital video sequence, 
said digital video sequence comprising some sets of images, usually one first set of texture 
images along with another set of images called disparity images or disparity maps. A
disparity map is used to reconstruct one image of a set of texture images from a reference 
image of said set of texture images. 

Such a method may be used within a video communication system SYS for 3D video
applications in MPEG2 or MPEG4, wherein said video communication system comprises a
transmitter TRANS, a transmission medium CH and a receiver RECEIV. Said transmitter
TRANS and said receiver RECEIV comprise an encoder ENC and a decoder DEC respectively.

In order to transmit efficiently some video sequences through the transmission
medium CH, said encoder ENC applies an encoding on a video sequence, then the encoded
video sequence is sent to a decoder DEC, which decodes said sequence. Finally the receiver
RECEIV displays said video sequence.

A 3D video sequence comprises some sets of images with objects, wherein an image 
is represented by a plurality of pixels. 

One object of a video sequence seen from two slightly different points of view does 
not differ very much. Therefore, a large part of points of view are predicted from reference 
points of view by virtue of prediction vectors also called disparity vectors. 

Since it is always possible to have disparity vectors that are all along the same 
direction - by rectification of the original stereo pair according to epipolar constraints for 
example - it can be supposed that there are only horizontal disparity vectors (the common 
case of a "parallel stereo setting" of video cameras). In this case, a disparity vector is 
defined by a single value, called disparity value. In the remainder of the description, a
disparity vector will be referred to as a disparity value. Of course, this should be in no way
restrictive. The disparity map is an image in which a disparity value is associated with every
pixel.

These disparity values define the shift of a pixel of an object between a
reference image and another image, at a time t, for example when the two said images
represent two different points of view of a same scene of the video sequence. The two
points of view of a scene are provided by two video cameras placed at different spots.
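
By way of illustration only, the shift of pixels described above can be sketched as follows for a single image row with integer horizontal disparity values. The function name `warp_row`, the sign convention (a pixel at column x moves to column x + d) and the use of `None` for positions that receive no pixel are assumptions of this sketch and are not taken from the description.

```python
def warp_row(reference_row, disparity_row, hole=None):
    """Shift each pixel of a reference row by its horizontal disparity.

    A pixel at column x is copied to column x + d (assumed convention).
    Columns that receive no pixel keep the `hole` marker.
    """
    width = len(reference_row)
    reconstructed = [hole] * width
    for x, (value, d) in enumerate(zip(reference_row, disparity_row)):
        target = x + d
        if 0 <= target < width:  # pixels shifted outside the row are dropped
            reconstructed[target] = value
    return reconstructed


left_row = [10, 20, 30, 40]
disparities = [0, 0, 1, 1]
right_row = warp_row(left_row, disparities)
```

In this example the third pixel moves one column to the right, leaving one unfilled position in the reconstructed row, and the fourth pixel leaves the row.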

In order to be efficiently coded by compression algorithms, the disparity values are 
represented by n-integer values, often on 8 bits of data representing 256 gray levels. The
main issue is that the translation between the encoded n-integer values and the disparity 
values can be of different types. 

The disparity map also relates to the depth of the objects of an image. Roughly, in
most classic representations of 3D images, the farther an object is in a reference image
(i.e. the greater its depth), the less apparent the movements of said object will be in the
reconstructed image. On the contrary, the nearer an object is in the reference image,
the more apparent the movements of said object will be in the reconstructed image.

In order to reduce the information that is transmitted via the transmission
medium, redundancies between points of view are used. Thus, as objects seen from two
different points of view do not differ very much, it is possible to predict one point of view
from the other one. One point of view, the reference one, will be encoded and sent via the
transmission medium CH to the receiver RECEIV. Said receiver RECEIV will decode it,
reconstruct the original reference point of view and deduce the other point of view from the
reference one thanks to the disparity vectors or values associated with said reference point of
view.

The encoder ENC comprises first encoding means adapted for encoding a type of a 
disparity map to use for the reconstruction of an image, and second encoding means for 
encoding the disparity map. 

The encoding of a video sequence is done as follows and is illustrated in Fig. 2.

In a first step 1), the type of the disparity map is encoded, wherein the type 
represents the way the disparity values are to be translated, i.e. computed. In a
non-limitative embodiment, a flag CI encodes said type of disparity map. In a first variant mode
of said embodiment, said flag CI is set for each image within a video sequence. In a second
variant mode of said embodiment, said flag CI is set for a group of images, for example in
the header of a group of images, said header being defined in the standard MPEG2 
referenced "ISO/IEC 13818-2:2000 Information technology - Generic coding of moving 
pictures and associated audio information: Video". 

This group of images, also referred to as a GOP ("Group Of Pictures"), would have the
particularity of having a same disparity map representation, i.e. the disparity values are
computed in the same manner. The type flag can be coded on 3 bits, for example, to
represent the disparity map. It can also have a variable length.

The following non-limitative representations can be applied for the disparity map: 
affine, logarithmic, polynomial, piecewise planar. 

For example, in the case of an affine representation, the disparity value is computed
with the following formula:

Disparity value = (N_integer - Shift) / Dynamic,

wherein N_integer represents the 256 gray levels coded on 8 bits, Shift represents the 3D
stereoscopic character of an image in relation to a user of the video system like a
television (3D image giving the impression of being "in" or "out" of the screen), coded on
8 bits, and Dynamic represents the depth of the objects amongst them, coded on 4 bits.
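
As a minimal sketch, the affine formula above can be written as a pair of functions. Only the formula itself comes from the description; the function names, the treatment of Shift as an unsigned 8-bit value, the rejection of Dynamic = 0, and the inverse mapping with rounding and clamping are assumptions made for the example.

```python
def affine_disparity(n_integer, shift, dynamic):
    """Disparity value = (N_integer - Shift) / Dynamic."""
    if not 0 <= n_integer <= 255:
        raise ValueError("N_integer must fit in 8 bits")
    if dynamic == 0:
        # Assumption: the description does not say how Dynamic = 0 is handled.
        raise ValueError("Dynamic must be non-zero")
    return (n_integer - shift) / dynamic


def affine_gray_level(disparity, shift, dynamic):
    """Assumed inverse mapping: round to the nearest gray level and clamp to 0..255."""
    n_integer = round(disparity * dynamic + shift)
    return max(0, min(255, n_integer))


# With Shift = 128 and Dynamic = 4, one gray level corresponds to 1/4 pixel.
d_zero = affine_disparity(128, 128, 4)
d_quarter = affine_disparity(129, 128, 4)
gray = affine_gray_level(2.0, 128, 4)
```

Under these assumptions a larger Dynamic gives a finer disparity step (1/Dynamic pixel per gray level) at the cost of a smaller range, which matches the near-object and far-object situations discussed earlier.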

In a second step 2), if the disparity map representation needs some parameters,
those parameters are also encoded.

For example, in the case of the affine representation, the shift and the dynamic
values are two parameters P1 and P2 that are encoded.

In a third and last step 3), the disparity map, i.e. the gray levels, is encoded with
general coding methods such as a DCT-based method, a lossless method or a mesh-based method.

Preferentially, the flag(s) CI and the associated parameters P1, P2... are put before
the encoded disparity map. They are not necessarily transmitted just before the disparity
map.

Note that a flag, and as the case may be its associated parameters P1, P2..., are
transmitted with the associated image or group of images. 

At the decoder DEC side, the knowledge of the type flag will tell said decoder whether it
has to wait for additional parameters or not.
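
The three steps and the decoder behaviour can be sketched as a small header writer and reader. This is only an illustration: the bit widths (3, 8 and 4 bits) follow the example values given in the description, but the code value 0 for the affine type, the bit order, the function names and the handling of every other type as parameterless are hypothetical choices of this sketch.

```python
AFFINE = 0  # hypothetical 3-bit code; the description assigns no code values


def write_header(map_type, params=()):
    """Return the type flag, followed by its parameters if any, as a bit string."""
    if not 0 <= map_type < 8:
        raise ValueError("type flag must fit in 3 bits")
    bits = format(map_type, "03b")
    if map_type == AFFINE:
        shift, dynamic = params
        if not 0 <= shift < 256 or not 0 <= dynamic < 16:
            raise ValueError("Shift needs 8 bits and Dynamic 4 bits")
        bits += format(shift, "08b") + format(dynamic, "04b")
    return bits


def read_header(bits):
    """Read the flag first, then decide whether parameters must be awaited."""
    map_type = int(bits[:3], 2)
    if map_type != AFFINE:
        # Placeholder: other representations are not modelled in this sketch.
        return map_type, (), bits[3:]
    shift = int(bits[3:11], 2)
    dynamic = int(bits[11:15], 2)
    return map_type, (shift, dynamic), bits[15:]


header = write_header(AFFINE, (128, 4))
decoded = read_header(header + "1111")  # "1111" stands for the encoded map data
```

Reading the flag before anything else is what lets the reader know how many further bits belong to the header and where the encoded disparity map begins.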

Thus, one advantage of the present invention is to tell the decoder, and therefore
the receiver, exactly how to use the disparity representation of an image to reconstruct an
image of a set of texture images from another one.

The use of a flag allows the type of a disparity map to be defined simply. Moreover, it
does not use too much memory, contrary to the use of a table, which would attribute to each
value of the gray levels an explanation about how to move a pixel, for example.

Such a table also has the drawback of being transmitted each time the disparity
map representation changes, that is to say a lot of bits have to be transmitted.
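
To give an order of magnitude for this comparison, the following sketch counts the bits needed in each case. The flag and parameter widths are the example values of the description; the 16 bits per table entry is purely an assumed figure, since the description does not specify how such a table would be stored.

```python
def table_cost_bits(levels=256, bits_per_entry=16):
    """Bits needed to send one table entry per gray level (entry size is assumed)."""
    return levels * bits_per_entry


def flag_cost_bits(type_bits=3, shift_bits=8, dynamic_bits=4):
    """Bits needed for the type flag plus the two affine parameters."""
    return type_bits + shift_bits + dynamic_bits


table_bits = table_cost_bits()
flag_bits = flag_cost_bits()
```

With these figures the table costs 4096 bits each time the representation changes, against 15 bits for the flag and its affine parameters.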

Another advantage of the present invention is that it improves the reconstruction of 
a point of view given a reference point of view and the associated disparity map. Indeed, 
with the flag CI and, as the case may be, with the parameters, the reconstruction of the 
reconstructed point of view is more precise and thus the reconstructed point of view fits
the original point of view more closely. The usage of the flag(s) to explain how the disparity
map shall be interpreted allows consistent 3D effects for the viewer, whatever translation
function was originally used to encode the disparity values.

Finally, a third advantage of the present invention relates to the
reconstruction of one view given a reference view and the associated disparity map, where we
have to fill the holes corresponding to parts of the reconstructed view that are not visible in
the reference view. The width of these holes depends on the dynamic of disparity, therefore
on the representation of the disparity map. If one wants to build an enhancement layer of
images devoted to the filling of the holes in the reconstructed views, precise references to
the way to compute the disparity values are now available.
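
The dependence of the hole width on the dynamic can be illustrated with a short sketch that reuses the affine formula of the description. The row of gray levels, the rounding of disparities to whole pixels, the sign convention (column x moves to x + d) and the function names are assumptions chosen for the example.

```python
def affine_disparity(n_integer, shift, dynamic):
    """Disparity value = (N_integer - Shift) / Dynamic."""
    return (n_integer - shift) / dynamic


def count_holes(gray_row, shift, dynamic):
    """Shift every pixel by its rounded disparity and count unfilled columns."""
    width = len(gray_row)
    filled = [False] * width
    for x, gray in enumerate(gray_row):
        target = x + round(affine_disparity(gray, shift, dynamic))
        if 0 <= target < width:
            filled[target] = True
    return filled.count(False)


# Hypothetical row: a foreground object (gray 144) in front of a background (gray 128).
row = [128] * 4 + [144] * 4 + [128] * 4
holes_dynamic_4 = count_holes(row, shift=128, dynamic=4)
holes_dynamic_8 = count_holes(row, shift=128, dynamic=8)
```

The same gray levels produce a hole four columns wide with Dynamic = 4 and two columns wide with Dynamic = 8, so a hole-filling layer can only be sized correctly if the way of computing the disparity values is known.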

It is to be understood that the present invention is not limited to the aforementioned
embodiments and variations and modifications may be made without departing from the
spirit and scope of the invention as defined in the appended claims. In this respect, the
following closing remarks are made.

It is to be understood that the present invention is not limited to the aforementioned 
3D video application. It can be used within any application using a system for processing a
signal where said signal is characterized by gray levels such as a heating signal. 

It is to be understood that the method according to the present invention is not 
limited to the aforementioned implementation. 

There are numerous ways of implementing functions of the method according to the 
invention by means of items of hardware or software, or both, provided that a single item of 
hardware or software can carry out several functions. It does not exclude that an assembly
of items of hardware or software or both carry out a function, thus forming a single function 
without modifying the method for processing the video signal in accordance with the 
invention. 

Said hardware or software items can be implemented in several manners, such as by 
means of wired electronic circuits or by means of an integrated circuit that is suitably
programmed, respectively. The integrated circuit can be contained in a computer or in an
encoder. In the second case, the encoder comprises first encoding means adapted for 
encoding a type of a disparity map to use for the reconstruction of an image, and second 
encoding means for encoding the disparity map, as described previously, said means being 
hardware or software items as above stated. 

The integrated circuit comprises a set of instructions. Thus, said set of instructions
contained, for example, in a computer programming memory or in an encoder memory may
cause the computer or the encoder to carry out the different steps of the encoding method.

The set of instructions may be loaded into the programming memory by reading a
data carrier such as, for example, a disk. A service provider can also make the set of 
instructions available via a communication network such as, for example, the Internet. 

Any reference sign in the following claims should not be construed as limiting the 
claim. It will be obvious that the use of the verb "to comprise" and its conjugations does not
exclude the presence of any other steps or elements besides those defined in any claim. The
article "a" or "an" preceding an element or step does not exclude the presence of a plurality
of such elements or steps. 
CLAIMS

1. A method for encoding a digital video sequence (VS), said digital video sequence 
comprising some sets of images including a disparity map, said disparity map being used
to reconstruct one image of a set of images from a reference image of said set of 
images, characterized in that it comprises the steps of: 

- encoding a type of the disparity map to use for the reconstruction of an image, and 

- encoding the disparity map. 

2. A method of processing a digital video sequence (VS) as claimed in claim 1, 
characterized in that the encoding of the type of the disparity map is done by a flag. 

3. A method of processing a digital video sequence (VS) as claimed in claim 1, 
characterized in that the encoding of the type of the disparity map is followed by a set 
of parameters. 



4. A computer program product for an encoder (ENC), comprising a set of instructions, 
which, when loaded into said encoder (ENC), causes the encoder (ENC) to carry out the 
method claimed in claims 1 to 3.

5. A computer program product for a computer, comprising a set of instructions, which, 
when loaded into said computer, causes the computer to carry out the method claimed 
in claims 1 to 3. 

6. An encoder (ENC) for encoding a digital video sequence (VS), said digital video 
sequence comprising some sets of images including a disparity map, said disparity map 
being used to reconstruct one image of a set of images from a reference image of said 
set of images, characterized in that it comprises first encoding means adapted for 
encoding a type of the disparity map to use for the reconstruction of an image, and 
second encoding means for encoding the disparity map. 

7. A video communication system, which is able to receive a digital video sequence (VS), 
comprising an encoder (ENC) as claimed in claim 6 for encoding said video signal, a 
transmission channel for transmitting the encoded video signal and a decoder (DEC) for 
decoding said encoded video signal. 
Method and encoder for encoding a digital video signal
ABSTRACT 

The present invention relates to a method and an encoder for encoding a digital video 
sequence, said digital video sequence comprising some sets of images including a disparity
map, said disparity map being used to reconstruct one image of a set of images from 
another image of said set of images. It is characterized in that it comprises the steps of: 

- encoding a type of a disparity map to use for the reconstruction of an image, and 

- encoding the disparity map. 

Use: encoder in a video communication system 

Reference: Fig. 2 